Journal of Speech, Language, and Hearing Research
American Speech-Language-Hearing Association
Preprints posted in the last 90 days, ranked by how well they match Journal of Speech, Language, and Hearing Research's content profile, based on 10 papers previously published here. The average preprint has a 0.00% match score for this journal, so anything above that is already an above-average fit.
Soman, A.; Dev, S. S.; Ravindren, R.
Background Phonemic awareness deficits are a core feature of Specific Learning Disorder-Reading (SLD-R). Understanding how task- and language-specific factors influence these deficits in alphasyllabic languages may help clarify the cognitive mechanisms underlying reading impairment in SLD-R. Methods Thirty children with a DSM-5 diagnosis of SLD-R (mean age 11.4 years) and 29 age-matched typically developing children were given phoneme blending (words and pseudowords) and segmentation tasks in Malayalam. The effects of age and consonant clusters on task performance were evaluated. Results Children with SLD-R performed significantly worse than controls across most phonemic awareness tasks, with the largest deficits observed in pseudoword blending and word blending and smaller deficits in segmentation. No significant difference was observed for initial phoneme deletion. In typically developing children, age showed strong positive correlations with phonemic performance across most tasks, whereas the SLD-R group showed weak or absent correlations, except in word blending and initial phoneme deletion. Consonant clusters significantly affected performance in both groups, with SLD-R showing more severe deficits. Conclusions The phonemic awareness deficits observed in SLD-R in alphasyllabic languages such as Malayalam are more prominent in tasks where lexical support is absent, such as pseudoword blending. These deficits vary with task type and linguistic complexity. Phonemic awareness improves with age in typically developing children, while improvement is uneven in children with SLD-R. The findings suggest that phonemic awareness deficits are a core feature of SLD-R across languages, but their manifestation is shaped by the orthographic and linguistic characteristics of the writing system.
Clarke, N.; Morin, B.; Bedetti, C.; Bogley, R.; Pellerin, S.; Houze, B.; Ramkrishnan, S.; Ezzes, Z.; Miller, Z.; Gorno Tempini, M. L.; Vonk, J. M. J.; Brambati, S. M.
INTRODUCTION: Connected speech analyses can help characterize linguistic impairments in primary progressive aphasia (PPA) and classify its variants; however, manual transcription of speech samples is time-consuming and expensive. Automated speech recognition (ASR) may be efficacious for transcribing PPA speech. METHODS: Transcripts of picture descriptions (109 PPA, 32 healthy controls (HC)) were generated using a manual, automated (Whisper), or semi-automated approach that included a quality control (QC) step. We evaluated transcript accuracy, the reliability of ASR-derived linguistic features, and classification performance. RESULTS: Whisper demonstrated the lowest error rates for HC, followed by the semantic, logopenic, and non-fluent PPA variants. Errors correlated with overall disease severity for the semantic and logopenic variants. QC of Whisper outputs reduced errors and improved the reliability of linguistic features. Overall, ASR-derived features achieved better classification performance than manual transcription features. DISCUSSION: Results support the use of off-the-shelf ASR for scalable, cost-efficient transcription and classification of PPA speech.
Vonk, J. M. J.; Lian, J.; Cho, C. J.; Antonicelli, G.; Ezzes, Z.; Wauters, L. D.; Keegan-Rodewald, W.; Kurteff, G. L.; Rodriguez, D. A.; Dronkers, N.; Henry, M. L.; Miller, Z. A.; Mandelli, M. L.; Anumanchipalli, G. K.; Gorno-Tempini, M. L.
Background: Artificial intelligence (AI)-based approaches to speech analysis have the potential to assist with objective speech error analysis in aphasia, but off-the-shelf tools often fail to detect speech errors because they prioritize "fluent transcription." Speech production errors (dysfluencies) are hallmark diagnostic features of the nonfluent (nfvPPA) and logopenic (lvPPA) variants of primary progressive aphasia, yet they can be challenging to detect and characterize even for expert clinicians. This study evaluated whether the novel automated lightweight Scalable Speech Dysfluency Modeling system (SSDM-L), specifically designed to detect dysfluencies, could accurately distinguish PPA variants using voice recordings of individuals reading a brief passage. Method: Participants were 104 individuals in total: 40 with nfvPPA, 40 with lvPPA (matched on disease severity), and 24 healthy controls, who read aloud the Grandfather Passage as part of a widely used motor speech evaluation (MSE). We automatically extracted ten speech error (dysfluency) variables using SSDM-L, including insertions, replacements, and deletions at both the phoneme and word levels, and phoneme-level prolongations and repetitions. Group differences were assessed via ANCOVAs controlling for age, education, and disease severity (MMSE, CDR sum-of-boxes). To test clinical relevance, we performed correlation analyses with MSE ratings provided by experienced speech-language pathologists (i.e., the gold standard) within the nfvPPA group. Classification performance was assessed by training random forest and XGBoost machine-learning models with 5-fold cross-validation. Results: All individuals read the entire passage in less than five minutes. SSDM-L detected eight of the ten predefined dysfluency features at sufficient frequency to include them in subsequent analyses. All eight features distinguished PPA from controls (p < .006). Individuals with nfvPPA made more errors than the lvPPA group on every feature (all p < .023). Each feature showed a moderate positive correlation with a global MSE apraxia/dysarthria score (r = .31-.56; p < .001-.053). Together, the eight features classified nfvPPA versus lvPPA at AUC = .806 (random forest) and AUC = .776 (XGBoost). Discussion: AI-based automated speech error analysis accurately distinguished the nfvPPA and lvPPA variants using a brief reading task. This quick, error-sensitive, scalable AI system has the potential to provide a practical tool to aid diagnosis in aphasia and motor speech disorders.
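The AUC figures above can be read as the probability that a randomly chosen nfvPPA speaker receives a higher classifier score than a randomly chosen lvPPA speaker. A minimal sketch of that rank-based computation, using invented scores rather than the study's data:

```python
# Minimal AUC computation via the Mann-Whitney rank statistic.
# The scores below are illustrative stand-ins, not the study's data.

def auc_from_scores(pos, neg):
    """P(random positive score > random negative score); ties count 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical dysfluency-based classifier scores for the two groups
nfvppa_scores = [0.9, 0.8, 0.75, 0.6, 0.55]
lvppa_scores = [0.7, 0.5, 0.4, 0.35, 0.2]
print(auc_from_scores(nfvppa_scores, lvppa_scores))
```

In practice a cross-validated model supplies the scores; this only shows how a single AUC value is derived from them.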
Bamberger, R.; Kuhles, G.; Lotter, L. D.; Dukart, J.; Konrad, K.; Guenther, T.; Siniatchkin, M.; Fuchs, M.; von Polier, G.
Background Diagnosis and treatment monitoring of attention-deficit/hyperactivity disorder (ADHD) largely rely on subjective assessments, highlighting the need for objective markers. Voice features and speech embeddings represent promising candidates for such markers, as they may capture alterations in speech production relevant to ADHD. However, it remains unclear which speech features are most informative for distinguishing ADHD and monitoring treatment effects, and which speech tasks most reliably elicit such differences. Methods Twenty-seven children with ADHD and 27 age-matched neurotypical controls completed six speech tasks across two study visits. Children with ADHD were unmedicated at baseline (first visit) and were assessed under prescribed methylphenidate treatment at follow-up, whereas controls underwent repeated assessment without intervention. Established acoustic voice features (eGeMAPS) and high-dimensional speech embeddings (WavLM, Whisper) were extracted and analysed using linear mixed models to examine baseline group differences and group-by-time interaction effects reflecting medication-associated change patterns. Results At baseline, children with ADHD differed significantly from controls in frequency, spectral, and temporal voice features, characterized by lower and more variable pitch, altered spectral properties, and reduced rhythmic stability. Group-by-time interaction effects indicated medication-associated modulation in the ADHD group, including reduced loudness variability and increased precision of vowel articulation at follow-up, changes not observed in controls. Speech embeddings revealed additional baseline and interaction effects beyond established acoustic features. Free speech tasks, particularly picture description, yielded the most robust and consistent effects.
Conclusion Children with ADHD differed from neurotypical controls in vocal features at baseline and showed distinct longitudinal change patterns consistent with medication-related change. These findings support further investigation of speech-based measures as candidate digital phenotypes and potential digital biomarkers in ADHD, with picture description emerging as a particularly promising task for future clinical assessment protocols.
Sharma, S.; Golden, R. M.; Montgomery, J. W.; Gillam, R. B.; Evans, J.
Because both monothetic and polythetic diagnostic classification approaches focus on the presence of individual symptom(s) to identify individuals in a clinical population, they may yield diagnostically sensitive clinical markers of multidimensional disorders such as developmental language disorder (DLD). DLD researchers have also used likelihood ratios (LRs) to identify possible diagnostic clinical markers of DLD; however, the diagnostic sensitivity of LRs varies markedly across studies. A recent multidimensional computational elastic-net regression examined a total of 71 measures of spoken language and cognitive processing from a cohort of 223 children ages 7;0 to 11;0 with and without DLD (DLD = 110; typically developing (TD) controls = 113). All 200 iterations of the model had high discriminative power (87%-88%) in positively identifying and distinguishing the DLD participants across all thresholds. Notably, the models identified a sparse DLD-specific deficit profile that included only nine of the 71 measures. In this study, we ask whether the individual LRs for each of these nine measures are equally sensitive in identifying and discriminating the children with DLD from TD controls, or whether diagnostic markers of multidimensional disorders such as DLD can only be identified through computational modeling approaches. The LRs for each of the nine measures were in the moderately high range (3.25-10). However, at the highest LR cut points for each measure, there was little to no overlap in the children each measure identified as having DLD. Follow-up analysis revealed that the elastic net model-derived predictive scores for each participant were significantly correlated with the participants' language ability. The model also identified a subgroup of TD participants as having the same DLD-deficit profile as the DLD participants. This subgroup comprised younger, predominantly male participants whose standardized language assessment scores were lower than those of the larger TD cohort. Taken together, the results of this study show that, because multidimensional modeling approaches such as elastic net regression leverage the variability in deficit profiles across individual members of a diagnostic group and the unique contributions of each behavioral feature of the phenotype, they may be an effective tool for deriving diagnostically specific deficit profiles for phenotypically complex, multicausal, multidimensional neurodevelopmental disorders such as DLD. The results also demonstrate the robustness of the derived DLD-specific deficit profile in identifying individuals with "mild" or subclinical DLD, demonstrating the potential utility of this approach in both clinical and research arenas. What this paper adds. What is already known on this subject: The identification of diagnostic markers for DLD has been a challenge for both clinicians and researchers across multiple decades. Monothetic classification markers such as non-word repetition, optional infinitives, or syntactic dependencies have been explored, as well as polythetic classification approaches in which a list of diagnostic symptoms is used together. However, each assumes different criteria and symptoms that should be included as diagnostic markers of DLD. What this study adds: Our study assessed the feasibility and effectiveness of monothetic vs. polythetic classification approaches for identifying DLD. Because our prior work, which used elastic net logistic regression computational modeling with strong discriminatory power, consistently selected nine key features as the DLD-deficit profile, in this effort we calculated each of the nine features' likelihood ratios to examine each measure's ability to identify children with DLD. The monothetic approach failed to identify a consistent set of children with DLD, and the polythetic classification approach also did not identify participants who were shown to have mild DLD by the elastic net modeling approach. Instead, our analysis showed that a computational modeling approach such as elastic net regression, which incorporates small but important contributions from multiple cognitive and linguistic measures, could better capture multifaceted information about the disorder, better account for individual variability, and consistently identify most participants with DLD. Clinical implications of this study: Elastic net logistic regression identifies a small subset of important features for distinguishing DLD and can assign each participant a probability that DLD is present. Rather than the polythetic and monothetic approaches commonly used in the field, our study shows that integrating advanced computational modeling, such as elastic net regression, with clinician judgment can better refine assessment processes and address prior and ongoing inconsistencies in the DLD literature and diagnostic practices.
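For readers unfamiliar with likelihood ratios as diagnostic markers: LR+ is a measure's hit rate divided by its false-alarm rate. A minimal sketch with assumed sensitivity and specificity values (the study's per-measure values are not reproduced here):

```python
# Diagnostic likelihood ratios from sensitivity and specificity.
# The 0.80/0.90 values below are hypothetical, chosen only to land in
# the "moderately high" 3.25-10 band described in the abstract.

def positive_lr(sensitivity, specificity):
    """LR+ = P(fail measure | DLD) / P(fail measure | TD)."""
    return sensitivity / (1.0 - specificity)

def negative_lr(sensitivity, specificity):
    """LR- = P(pass measure | DLD) / P(pass measure | TD)."""
    return (1.0 - sensitivity) / specificity

print(positive_lr(0.80, 0.90))  # about 8, a moderately high LR+
```

A high LR+ says a failed measure strongly raises the odds of DLD, but, as the abstract notes, two measures with similar LRs can still flag largely non-overlapping sets of children.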
Keshavarzi, M.; Moore, B. C. J.; Goswami, U.
Neural oscillations in the delta (0.5-4 Hz) and theta (4-8 Hz) bands play a key role in tracking the temporal structure of speech. According to Temporal Sampling (TS) theory, dyslexia arises from atypical entrainment of these low-frequency oscillations to speech during infancy and childhood, which is particularly disruptive to phonological encoding. However, studies of adults with dyslexia have rarely examined both delta and theta cortical tracking under naturalistic listening conditions, and have not measured delta-band cortical tracking. Using EEG, we focused here on delta- and theta-band cortical tracking of continuous natural speech by adults with and without dyslexia, applying a decoding analysis previously used with dyslexic children. Forty-eight English-speaking adults (24 dyslexic, 24 control) listened to a 16-minute continuous spoken narrative while EEG was recorded. Neural decoding of the speech envelope was quantified using backward multivariate Temporal Response Function (mTRF) models applied at two levels: a between-group analysis evaluating group-level differences in neural representation patterns, and a within-participant analysis assessing individual decoding accuracy. Cerebro-acoustic coherence was computed in parallel to provide a complementary measure of neural-speech synchronisation. Additional analyses examined band power, cross-frequency phase-amplitude coupling (PAC), and cross-frequency phase-phase coupling (PPC). Dyslexic adults exhibited less accurate delta- and theta-band decoding in the between-group analysis and reduced theta-band decoding accuracy in the within-participant analysis, alongside reduced coherence in both bands and increased delta-band power, particularly over the right temporal region. No group differences were found for PAC or PPC.
Highlights: Adults with dyslexia showed reduced delta- and theta-band speech decoding. Cerebro-acoustic coherence was reduced in the delta and theta bands in the dyslexia group. Delta-band power was increased in dyslexia, especially over the right temporal region. Cross-frequency coupling did not differ between adults with and without dyslexia.
Muller, B.; Ortiz Barranon, A. A.; Roberts, L.
Dysarthric speech severity assessment typically requires either trained clinicians or supervised machine learning models built from labelled pathological speech data, limiting scalability across languages and clinical settings. We present a training-free method (no supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner) that quantifies dysarthria severity by measuring the degradation of phonological feature subspaces within frozen HuBERT representations. For each speaker, we extract phone-level embeddings via the Montreal Forced Aligner, compute d scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy control speech, and construct a 12-dimensional phonological profile. Evaluating 890 speakers across 10 corpora, spanning 5 languages for the full MFA pipeline (English, Spanish, Dutch, Mandarin, French) and 3 primary aetiologies (Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis), we find that all five consonant d features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2 x 10^-4; pooled Spearman rho = -0.47 to -0.55, with bootstrap 95% CIs not crossing zero), with the effect replicating within individual corpora, surviving FDR correction, and remaining robust to leave-one-corpus-out removal and alignment quality controls. Nasality d decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method requires no dysarthric training data and applies to any language with an existing MFA acoustic model (currently 29 languages) or a model trained from healthy speech alone. It produces clinically interpretable per-feature profiles. We release the full pipeline and phone feature configurations for six languages to support replication and clinical adoption. Author Summary: One of the authors has lived with ALS for sixteen years. Bernard Muller, who built this entire analytical pipeline using only eye-tracking technology, has experienced the progression of the disease firsthand, including the dysarthric speech that comes with advancing ALS and the tracheostomy that followed. The problem this paper addresses is not abstract to him, and that shapes how the method was designed. We developed a method to measure how well a person with dysarthria can produce distinct speech sounds, without needing any recordings of disordered speech for training. Our approach works by analysing how a widely available AI speech model organises different sound categories -- such as nasal versus oral consonants, or voiced versus voiceless sounds -- and measuring whether those categories become harder to tell apart. We tested it on 890 speakers across 10 datasets in five languages, covering Parkinson's disease, cerebral palsy, and ALS. Because the method only needs healthy speech recordings to set up, it applies to any language with an existing acoustic model, currently covering 29 languages. The resulting profiles show clinicians which specific aspects of speech production are degrading, rather than providing a single opaque severity score. This could support remote monitoring of speech decline in neurodegenerative disease and enable screening in languages and settings where specialist assessment is unavailable.
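The per-feature d scores described above are, in essence, Cohen's d computed on embeddings projected onto a phonological contrast direction; a collapsed contrast yields a small d. A toy sketch with invented projection values (the HuBERT and MFA machinery is omitted):

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation (sample variance, n-1)."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Hypothetical projections of nasal vs. oral phone embeddings onto a
# nasality direction estimated from healthy control speech. A large d
# means the contrast is preserved; a small d suggests it has collapsed.
nasal = [1.2, 1.0, 1.4, 0.9, 1.1]
oral = [0.2, 0.4, 0.1, 0.3, 0.5]
print(cohens_d(nasal, oral))
```

In the method above this separation statistic would be computed once per phonological contrast, yielding the 12-dimensional per-speaker profile.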
Ahamdi, S. S.; Fridriksson, J.; Den Ouden, D.
Language impairments in aphasia are characterized by various representational disruptions that may be reflected in discourse production. This research examines the capacity of transformer-based language models, particularly GPT-2, to serve as a computational framework for analyzing variations in aphasic narrative speech. A longitudinal dataset of narrative speech samples collected at six time points from individuals with aphasia (N = 47) was utilized as part of an intervention study. All transcripts were processed via the GPT-2 language model to obtain activation values from each of the 12 transformer layers. Statistically significant differences in activation magnitude across aphasia subtypes were found at every layer (all p < .001), with the most pronounced effects in the deeper layers. Pairwise Tukey HSD tests revealed consistent distinctions between Broca's aphasia and both Anomic and Wernicke's aphasia, suggesting a shared activation profile between the latter two. Longitudinal tests revealed significant changes over time, especially in the final three layers (10-12). These findings suggest that transformer-based activation patterns reflect meaningful variation in aphasic discourse and could complement current diagnostic tools. Overall, GPT-2 provides a scalable tool to model representational dynamics in aphasia and enhance the clinical interpretability of deep language models.
Nanda, S.; Gervino, G.; Pang, C. Y.; Garnett, E. O.; Usler, E.; Chugani, D. C.; Chang, S.-E.; Chow, H. M.
Developmental stuttering is a complex neurodevelopmental disorder characterized by disfluent speech. At the individual level, the behavioral manifestations of stuttering vary considerably, likely reflecting heterogeneity in underlying neural mechanisms. In this study, we examined individual-specific differences in the brains of children who stutter (CWS), by implementing normative modeling, a framework that quantifies how an individual deviates from an age- and sex-matched reference population. We applied this approach to identify individual-specific structural brain atypicalities using gray and white matter volumes. These volumes were derived from MRI scans from a large mixed-longitudinal dataset of 235 and 240 scans from CWS and fluent controls respectively, aged between 3 and 12 years. Individual deviation maps capturing these atypicalities were then used to cluster CWS into subtypes based on similarities in their neuroanatomical profiles. This analysis identified four neural subtypes with distinct neuroanatomical atypicalities relative to fluent controls. The key findings were a basal ganglia-thalamo-cerebellar subtype associated with higher stuttering severity and lower rates of recovery, and a white matter subtype characterized by mild severity and a higher likelihood of recovery. The remaining two subtypes showed cerebellar differences alongside alterations in brain regions involved in sensorimotor integration. Moreover, cerebellar volume atypicalities were present in all four subtypes, indicating that cerebellar alterations were present across otherwise distinct neural profiles and may represent a shared neuroanatomical feature of stuttering. These findings indicate that examining individual-specific neural differences and subtyping based on patterns of neural atypicalities provides valuable insight into the heterogeneity of developmental stuttering and represents a promising direction for improving our understanding of the disorder.
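Normative modeling, as used above, expresses each child's regional volume as a deviation z-score relative to a reference model's age prediction. A toy sketch with an age-only linear reference model and invented volumes (the study's actual normative framework, which also accounts for sex and uses many regions, is more elaborate):

```python
import math

def fit_normative(ages, volumes):
    """Least-squares line volume ~ age, plus residual SD, from a reference sample."""
    n = len(ages)
    ma = sum(ages) / n
    mv = sum(volumes) / n
    slope = sum((a - ma) * (v - mv) for a, v in zip(ages, volumes)) / \
            sum((a - ma) ** 2 for a in ages)
    intercept = mv - slope * ma
    resid_sd = math.sqrt(sum((v - (intercept + slope * a)) ** 2
                             for a, v in zip(ages, volumes)) / (n - 2))
    return intercept, slope, resid_sd

def deviation_z(age, volume, model):
    """How far an individual's volume deviates from the age-expected value."""
    intercept, slope, resid_sd = model
    return (volume - (intercept + slope * age)) / resid_sd

# Invented reference sample (fluent controls): volume grows with age
ref_ages = [3, 5, 7, 9, 11]
ref_vols = [10.1, 10.9, 12.2, 12.8, 14.0]
model = fit_normative(ref_ages, ref_vols)
print(deviation_z(8, 12.2, model))  # negative z: smaller than expected for age
```

Maps of such z-scores across brain regions are what the clustering step would then group into subtypes.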
Echeverria-Altuna, I.; Demirel, B.; Boettcher, S. E. P.; Watkins, K. E.; Nobre, A. C.
Stuttering involves interruptions to the smooth flow of speech occurring mostly at syllable onset. Speech fluency is enhanced in people who stutter (PWS) by external timing cues. This has been taken to indicate that difficulties in the temporal organisation of action selection and initiation during speech contribute to stuttering. An important unanswered question is whether putative temporal coordination difficulties are specific to speech or generalize to other actions. Here, we examined the temporal organisation of hand action selection in PWS. Twenty PWS and twenty typically fluent speakers (TFS) underwent magnetoencephalography (MEG) recording while performing a visuomotor working-memory task that encouraged temporally specific selection, preparation and shifts between hand actions. Lateralised sensorimotor mu/beta-frequency (8-30 Hz) activity modulation accompanying hand-action prioritisation was weaker in PWS than TFS. Strikingly, this effect was specific to a period of high uncertainty regarding which action to select and when. Despite these differences, behavioural performance was well matched between PWS and TFS, and sensorimotor mu/beta activity was functionally relevant for task performance in both groups. The findings suggest a general disruption of temporal structuring of action selection and preparation in stuttering.
Hsu, C.; Ivaniuk, A.; Jimenez-Gomez, A.; Brunger, T.; Bosselmann, C. M.; Perry, M. S.; Phan, C.; Arenivas, A.; Ludwig, N. N.; Leu, C.; Lal, D.
Rationale: Neurodevelopmental disorders (NDDs) are characterised by significant challenges in communication, social interaction, and adaptive function, often impacting quality of life. Previous studies support genetic influences on the communication abilities of individuals with NDD, but were either limited to single genetic conditions or to small cohorts with a limited selection of communication measures. Methods: We analysed caregiver-reported communication abilities in 79,518 individuals with NDD from the Simons Searchlight and SPARK registries: 4,439 with a CNV-based or monogenic NDD and 75,079 with autism spectrum disorder (ASD) without a known genetic cause (idiopathic ASD) as controls. For analysis, we a priori selected 10 communication-related measures based on their availability in the study cohorts, coverage of distinct communication aspects, and their frequent use in neurodevelopmental phenotyping, yielding 177,328 data points across all study cohorts. The individuals in the Searchlight registry were divided into a Discovery cohort (the 15 most prevalent genetic NDD conditions) and a Confirmation cohort (all other genetic NDD conditions). A second Confirmation cohort was generated using all individuals with genetic ASD forms from the SPARK registry. We then tested each of the three case cohorts and each genetic condition represented in the Discovery cohort against the ASD control cohort. Developmental trajectories were assessed through testing of participants grouped by age at evaluation. Results: Measure-level analyses demonstrated significant associations between genetic status and communication abilities, differences in communication abilities between classes of genetic variants (monogenic vs. CNV-based NDDs), and variability between specific genetic NDD conditions. CNV-based NDDs showed milder communication impairment, outperforming idiopathic ASD controls in 9/10 communication measures, whereas monogenic NDD conditions had more pervasive impairments, especially in verbal communication. Although impaired in verbal communication, five monogenic NDD conditions showed at least suggestive strengths in nonverbal and social communication relative to idiopathic ASD controls (CSNK2A1, CTNNB1, SETBP1, MED13L, and PPP2R5D), specifically in using gestures. Developmental trajectory analyses revealed STXBP1 as the gene group at highest risk of developmental stagnation in communication abilities. Conclusions: These findings underscore the potential of precision speech-language pathology (SLP) approaches tailored to the specific verbal and nonverbal communication strengths and weaknesses of genetic groups. We also provide evidence for measurable improvements and declines in communication abilities with age at the group level, highlighting the need for developmentally informed care. By integrating genetic insights into clinical practice, precision SLP approaches may enhance communication outcomes and developmental progress and improve quality of life for individuals with genetic NDDs.
Mirsharofov, M. M.
Background: Autism spectrum disorder (ASD) is frequently associated with speech and language difficulties, yet empirical data from Central Asian countries remain scarce. This study examined the association between a diagnosis of childhood autism (ICD-10: F84.0) and the presence of speech development difficulties in a clinical sample from Tajikistan. Method: A retrospective cross-sectional study was conducted using clinical records of 85 patients (36 with F84.0; 49 with other psychiatric diagnoses) at the Insight Mental Health Center in Dushanbe, Tajikistan (December 2025-January 2026). Speech difficulties were identified through systematic review of clinical notes. Between-group comparisons were performed using Pearson's χ2 test, odds ratios (OR), relative risk (RR), and effect size measures (φ coefficient, Cohen's h). Results: Speech difficulties were present in 72.2% of the autism group versus 36.7% of the comparison group. The association was statistically significant (χ2 = 10.47, p < .01). Children with autism had substantially higher odds of speech difficulties (OR = 4.48, 95% CI [1.76, 11.38]), with a large effect size (Cohen's h = 0.73). Conclusions: Autism diagnosis was significantly associated with elevated rates of speech difficulties in this Tajik clinical sample. Practical implications: These findings support the systematic inclusion of speech-language assessment and intervention within autism care protocols, particularly in Central Asian healthcare settings where such integration remains limited. Highlights: Speech difficulties were identified in 72.2% of children with autism (F84.0) in a Tajik clinical sample. Children with autism were 4.5 times more likely to present with speech difficulties than those with other diagnoses (OR = 4.48, 95% CI [1.76, 11.38]). The most prevalent speech pattern was complete absence of expressive speech (nonverbal presentation). Findings support the integration of speech-language assessment into standard autism care protocols in Central Asia. This is one of the first empirical reports on autism and speech profiles from Tajikistan.
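The reported effect sizes can be reconstructed from the 2x2 counts implied by the percentages (26/36 autism and 18/49 comparison with speech difficulties; these counts are back-calculated from the reported rates, so treat them as an assumption):

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table [[a, b], [c, d]]: (a/b) / (c/d)."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """RR: risk in the exposed row over risk in the unexposed row."""
    return (a / (a + b)) / (c / (c + d))

def cohens_h(p1, p2):
    """Cohen's h effect size for a difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Counts back-calculated from the abstract: 72.2% of 36 and 36.7% of 49
a, b = 26, 10   # autism group: with / without speech difficulties
c, d = 18, 31   # comparison group: with / without speech difficulties
print(round(odds_ratio(a, b, c, d), 2))              # 4.48, as reported
print(round(cohens_h(a / (a + b), c / (c + d)), 2))  # 0.73, as reported
```

That both values reproduce the abstract's figures supports the back-calculated counts, though the paper itself should be consulted for the exact table.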
Perugia, E.; Georga, C.
Background: Auditory steady-state responses (ASSRs) provide an objective method for estimating hearing thresholds in individuals unable to provide behavioural responses. Bone conduction (BC) testing is required to differentiate conductive from sensorineural hearing loss. Accurate BC ASSR threshold estimation relies on "correction" factors, which are not yet well established. This meta-analysis evaluated the reliability of BC ASSR thresholds for estimating hearing thresholds at 500, 1000, 2000, and 4000 Hz. Methods: A systematic search of PubMed, the Cochrane Library, and Embase was conducted to identify studies involving normal-hearing (NH) and hearing-impaired (HI) participants of all ages. Outcomes were (1) the difference between behavioural and ASSR thresholds, and (2) ASSR thresholds. The risk of bias was evaluated using the Newcastle-Ottawa Scale. The mean and 95% confidence intervals (CI) were calculated for the thresholds at the four frequencies. The certainty of the evidence was assessed using the GRADE approach. Results: Of the records identified, 11 met the inclusion criteria, yielding a total of 27 studies. Sample sizes ranged from 60 to 249 participants across frequencies and age groups. The quality of records ranged from low to high. Data were synthesised using random-effects models due to heterogeneity. In NH adults, the mean differences (±95% CI) between BC ASSR thresholds and behavioural thresholds were 17.0 (±4.8), 15.5 (±6.0), 13.4 (±3.3), and 12.1 (±4.1) dB at 500, 1000, 2000, and 4000 Hz, respectively. In NH infants, mean (±95% CI) BC ASSR thresholds were 17.2 (±2.2), 10.5 (±3.6), 26.4 (±2.7), and 19.9 (±4.0) dB HL at the same frequencies. The certainty of the evidence was very low. Conclusions: BC ASSR can be a reliable method for estimating BC thresholds. However, age and frequency significantly affect BC ASSR thresholds, highlighting the need to develop "correction" factors that accurately predict BC behavioural thresholds. Registration: PROSPERO CRD42023422150.
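Pooled means with 95% CIs of the kind reported above are typically obtained by inverse-variance weighting, recovering each study's standard error from its CI half-width. A fixed-effect sketch with invented study values (the review itself used random-effects models, which additionally estimate between-study variance):

```python
# Fixed-effect inverse-variance pooling; study values below are invented.
def pool_fixed_effect(means, ci_halfwidths):
    """Pooled mean and 95% CI half-width from per-study means and CI half-widths.
    Each SE is recovered from its 95% CI half-width (half-width = 1.96 * SE)."""
    weights = [(1.96 / hw) ** 2 for hw in ci_halfwidths]  # weight = 1 / SE^2
    total = sum(weights)
    pooled_mean = sum(w * m for w, m in zip(weights, means)) / total
    pooled_hw = 1.96 / total ** 0.5  # SE of pooled mean = 1/sqrt(sum of weights)
    return pooled_mean, pooled_hw

# Invented per-study BC ASSR minus behavioural threshold differences (dB)
study_means = [13.0, 14.5, 12.0]
study_hws = [3.0, 4.0, 5.0]
mean, hw = pool_fixed_effect(study_means, study_hws)
print(round(mean, 1), round(hw, 1))
```

Tighter-CI studies dominate the pooled estimate, which is why the pooled half-width is smaller than any single study's.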
Dunham-Carr, K.; Keceli-Kaysili, B.; Markfeld, J. E.; Pulliam, G.; Clark, S. M.; Feldman, J. I.; Santapuram, P.; McClurkin, K.; Agojci, D.; Schwartz, A.; Lewkowicz, D. J.; Woynaroski, T. G.
Show abstract
Differences in looking to and processing of audiovisual speech have been theorized to contribute to heterogeneity in language ability in autistic children. Differential audiovisual speech processing has been indexed by event-related potentials (ERPs), specifically via amplitude suppression in response to audiovisual versus auditory-only speech, and linked with vocabulary in school-aged children. This study used an intact-group comparison and concurrent correlational design in infant siblings of autistic children (Sibs-Autism) and non-autistic children (Sibs-NA) to determine whether amplitude suppression is (a) present in infancy, (b) different in Sibs-Autism versus Sibs-NA, and (c) related to looking to audiovisual speech and language abilities. We collected EEG data from 54 infants aged 12-18 months (29 Sibs-Autism; 25 Sibs-NA) while they viewed videos of audiovisual and auditory-only speech, as well as eye tracking and language data. We found significant amplitude differences at the N2 ERP component in response to audiovisual versus auditory-only speech but no significant group differences in ERP amplitudes. Associations between looking to audiovisual speech, amplitude effects, and language were moderated by group, chronological age, and biological sex. Our findings suggest that differential audiovisual speech processing is present in 12-18-month-olds and may explain heterogeneity in looking to audiovisual speech and emerging language ability.
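The amplitude-suppression index described above, a reduced ERP amplitude to audiovisual relative to auditory-only speech, can be sketched numerically. This is a toy illustration, not the study's pipeline: average the epochs per condition, take the mean amplitude within the component's time window, and subtract.

```python
def mean_window_amplitude(epochs, start, stop):
    """Average epochs (lists of samples, e.g., microvolts) into a grand
    average, then take the mean amplitude inside the component window."""
    n = len(epochs)
    grand = [sum(ep[i] for ep in epochs) / n for i in range(len(epochs[0]))]
    window = grand[start:stop]
    return sum(window) / len(window)

# Toy epochs; the window indices stand in for the N2 latency range
av_epochs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # audiovisual speech
ao_epochs = [[4.0, 4.0, 4.0], [4.0, 4.0, 4.0]]   # auditory-only speech
suppression = (mean_window_amplitude(av_epochs, 0, 3)
               - mean_window_amplitude(ao_epochs, 0, 3))
print(suppression)  # negative = suppressed audiovisual amplitude
```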
Sainz-Pardo, M.; Hernandez, M.; Suades, A.; Juncadella, M.; Ortiz-Gil, J.; Ugas, L.; Sala, I.; Lleo, A.; Calabria, M.
Show abstract
Introduction. There is consistent evidence of a disadvantage in bilinguals' speech production compared to monolinguals in healthy individuals, but studies investigating this phenomenon in clinical populations such as Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD) are scarce. Given that both clinical groups are characterized by word-finding difficulties, understanding how bilingualism influences speech production in these populations is essential. Methods. Early and highly proficient Catalan-Spanish bilinguals (active bilinguals) were compared to Spanish-dominant speakers with low proficiency in Catalan (passive bilinguals) using a picture-naming task. The study included 58 older adults, 66 patients with AD, and 124 individuals with MCI. Reaction times, accuracy, and error types were collected in the naming task in each individual's dominant language. Results. First, active bilinguals demonstrated faster naming latencies than passive bilinguals, particularly for low-frequency words. Second, active bilinguals with MCI exhibited more naming errors than passive bilinguals with MCI, including a higher incidence of cross-language intrusions and anomia. Third, passive bilinguals with MCI and AD showed more semantic errors than active bilinguals. Discussion. These findings underscore the impact of second language use on naming performance in MCI and AD. Moreover, they provide insight into the potential mechanisms underlying lexical retrieval differences in bilinguals, including lexico-semantic processing and language control.
Hunter, L. L.; Feeney, M. P.; Fitzpatrick, D.; Keefe, D. H.
Show abstract
Objectives The overall goal of this study was to assess tympanometric and ambient wideband acoustic immittance (WAI) tests and wideband acoustic reflex thresholds (ART) in well-baby and newborn intensive care unit (NICU) cohorts, with three specific objectives: 1) assess the predictive accuracy of wideband tympanometry (WBT) and ART for conductive dysfunction in ears referring at the first or second stage of newborn hearing screening; 2) identify inadequate tests likely due to probe blockages or leaks; and 3) assess prediction models separately for well-baby and NICU screening outcomes. Design Prospective, observational study of full-term (n=514) and premature (n=239) newborns recruited from a birth hospital newborn hearing screening program in well-baby and NICU nurseries. Wideband tympanometry, ambient absorbance, and acoustic reflexes were tested after Stage 1 transient otoacoustic emissions (TEOAE) screening. The reference standard for Pass or Refer groups was initially defined by the Stage 1 TEOAE test result. Pass or Refer groups were then reassigned based on the Stage 2 screening ABR for those who referred at Stage 1, and for all NICU infants. Multivariate models were developed using reflectance and admittance variables to predict conductive dysfunction relative to the screening reference standard in a randomized sub-group of subjects at Stage 1 and Stage 2 screening. Classification accuracy was evaluated on a second, independent sub-group. Individual tests were classified as having inadequate probe fits if they had excessively low values of sound pressure level, susceptance (leak), or absorbance (blockage). Results Differences in ambient absorbance between Pass and Refer screening groups, and the corresponding effect sizes, were greatest in frequency bins between 1.4 and 2 kHz. Screening failure at both Stage 1 and 2 was most accurately predicted by models using ambient absorbance and power level variables at frequencies between 1 and 2.8 kHz, including ARTs.
Tympanometric admittance variables at the positive-pressure tail for frequencies between 1 and 2.8 kHz, in combination with the ART, were more accurate predictors than those at peak pressure or the negative-pressure tail. Multivariate models generalized well to an independent group of infants at both Stage 1 and 2 for both the ambient and tympanometric models. Ambient tests revealed more inadequate tests than tympanometric tests, primarily due to blocked probe tips. Excluding ears with detected probe leaks or blockages slightly improved the ambient prediction models but did not affect the tympanometric models. Conclusion Wideband acoustic reflex tests improved all models for ambient and tympanometric absorbance. Multivariate prediction models developed for WAI tests were repeatable in an independent group of well-baby and NICU infants, suggesting that the results are generalizable to these populations. Detection of probe blockage or leaks slightly improved prediction for ambient measures. Pressurized tests have the advantage of requiring a hermetic seal, and thus are useful for ensuring adequate probe insertion.
Manasevich, V.; Kostanian, D.; Rogachev, A.; Sysoeva, O.
Show abstract
Rise time (RT) is considered one of the most significant acoustic characteristics of auditory speech stimuli. A substantial amount of data has been accumulated on the neurophysiological mechanisms of RT processing under different conditions and in different groups of people, but these data have not been systematised. This review focuses on studies that have investigated electroencephalographic (EEG) markers of RT sensitivity. The literature search was conducted according to the PRISMA statement in the PubMed, Web of Science and APA PsycInfo databases. The resultant review comprised 37 studies that considered diverse aspects of RT processing. The review describes the main stimulation parameters affecting electrophysiological markers of RT processing, reflected in different components of event-related potentials (ERPs), brainstem responses and cortical rhythmic activity. The main finding of this review is that RT prolongation leads to a decrease in the amplitude of the main ERP components and an increase in their latencies. However, the sensitivity of the EEG markers varied: the earliest components tracked subtle differences (a few tens of microseconds), while the later components coded larger ones (up to 500 ms). Nevertheless, the observed effects may vary and depend on aspects of the experimental paradigm, the age of participants and speech-related problems. Future research may benefit from addressing understudied clinical groups and ERP components such as P1 and N2, which are dominant in children.
Liu, H.; Betke, M.; Ishwar, P.; Kiran, S.
Show abstract
Individuals with post-stroke aphasia live with long-term disabilities, yet they do not know whether they will improve their communication and cognitive skills over time. We propose a "Therapy Calculator" to provide patients with a better understanding of likely recovery as they engage with therapy. Using a large dataset of rehabilitation outcomes from a digital therapeutic called Constant Therapy (3.5 million therapy sessions from 18,000+ users), we developed a machine learning algorithm that estimates the probability of improvement from one functional landmark (i.e., a given skill level) to the next in a functional domain (e.g., reading), while accounting for age, etiology, starting performance, and frequency and duration of therapy. This logistic regression model performed a binary classification task, i.e., whether patients can improve to the next landmark, with an average F1 score across all models of 0.84, suggesting reliable prediction of moving to the next landmark. We then created an online "Therapy Calculator" that assesses a new user's current functional level and demographic information, and makes predictions by passing these features into models trained on relevant subsets of historical data. The findings indicate that our model can provide reliable predictions for patients beginning self-managed speech-language therapy (SLT), and the Therapy Calculator is publicly available.
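The setup described here, a logistic-regression binary prediction of whether a patient reaches the next landmark, scored with F1, can be sketched as follows. This is an illustrative re-implementation on toy data, not the authors' model; the single "starting performance" feature and its values are invented.

```python
import math

def train_logistic(xs, ys, lr=0.5, steps=2000):
    """Fit a one-feature logistic regression (e.g., starting performance
    -> probability of reaching the next landmark) by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def f1_score(truth, pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy data, separable by construction: higher starting performance -> improves
xs, ys = [0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
pred = [1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0 for x in xs]
print(f1_score(ys, pred))  # 1.0 on this separable toy set
```

The paper's average F1 of 0.84 would be computed the same way, aggregated over held-out predictions from many such models.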
Parchure, S.; Gupta, A.; Kelkar, A.; Vnenchak, L.; Faseyitan, O.; Medaglia, J. D.; Harvey, D. Y.; Coslett, H. B.; Hamilton, R. H.
Show abstract
Aphasia, an acquired language deficit, is the most common post-stroke focal cognitive impairment, and roughly 60% of cases become chronic (duration >6 months). Aphasia therapies could be optimized if clinicians could make personalized predictions of how individual persons with aphasia (PWA) would be likely to perform on particular language tasks. However, current approaches relying on imaging, lesion volume, patient demographics, and clinical scores achieve less than 50% accuracy in predicting performance in PWA. Research algorithms using complex imaging and fMRI can make binary predictions about the presence or absence of aphasia but do not give more clinically relevant information. We aim to predict word-by-word speech accuracy in PWA to better enable personalized speech therapies. To be clinically informative, machine learning models developed for this purpose should use clinically available inputs, explain the key features behind a prediction, and generalize to new PWA and previously unseen words. This study combines multimodal input features from clinical testing scores and structural MRI neuroimaging with a novel data source: word-by-word linguistic difficulty. We computed metrics of cognitive burden, such as semantic selection and recall demands, and of articulatory burden, such as word length in phonemes and syllables, using naturalistic corpora containing over a billion words of English text. Retrospective training, ten-fold cross-validation, and 500-run bootstrapping of different machine learning models with various combinations of input features were conducted using 4620 trials. A simplified version of the best model using widely available inputs was deployed clinically through a web app, and prospective generalization was tested on 570 trials with unseen words and different naming tasks in new PWA.
We found the best performance with random forest classifiers using linguistic difficulty combined with either clinical information (AUROC ± SEM = 0.87 ± 0.07) or all inputs together with structural imaging connectivity (0.90 ± 0.04). Classifiers using multimodal inputs significantly outperformed those employing single inputs (range 0.66-0.85, p<0.05). Extracting feature importances from the best model showed that Western Aphasia Battery scores, semantic demands, and the number of phonemes and syllables were predictive of PWA speech accuracy. Structural integrity in peri-lesional brain regions predicted better language performance, whereas higher connectivity of select contralateral homotopes contributed to the prediction of worse speech. Without the inclusion of MRI data, lesion volume was a key predictor of PWA speech as well. A simplified, clinically ready, explainable model (publicly available as the AphasiaLENS web application) predicted PWA accuracy for any user-entered word, not restricted to a standardized battery. Its prospective generalization performance was not significantly different from that of the best model using full inputs (AUROC range 0.81-0.89, p>0.05). Thus, our research can help inform individualized treatment planning for PWA, while also suggesting research targets through a better understanding of brain-behavior relationships.
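AUROC, the metric reported above, equals the probability that a randomly chosen accurate-word trial receives a higher model score than a randomly chosen inaccurate one. A minimal rank-comparison sketch (toy labels and scores, not the study's classifier):

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney U statistic): the
    fraction of positive/negative pairs where the positive trial scores
    higher, counting ties as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy trials: 1 = word produced accurately, 0 = not; scores = model probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

An AUROC of 0.5 corresponds to chance ranking, so the reported 0.87-0.90 means the multimodal models rank nearly all trial pairs correctly.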
Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.
Show abstract
Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech-in-noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF treated with AGs are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard-frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) thresholds (afferent) and growth functions (efferent). Design: A total of 185 participants were enrolled: 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli. Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group.
Increased MEMR growth was also significantly associated with poorer SiN perception in the CF group. Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects of ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.